Estimating the Area under a Receiver Operating Characteristic Curve For Repeated Measures Design
The receiver operating characteristic (ROC) curve is widely used for diagnosis as well as for judging the discrimination ability of different statistical models. Although theories about ROC curves have been established, and computation methods and computer software are available for cross-sectional designs, limited research has been done on estimating ROC curves and their summary statistics for repeated measures designs, which are useful in many applications, such as biological, medical and health services research. Furthermore, there is no published statistical software available that can generate ROC curves and calculate summary statistics of the area under a ROC curve for data from a repeated measures design. Using a generalized linear mixed model (GLMM), we estimate the predicted probability of the positivity of a disease or condition, and the estimated probability is then used as a biomarker for constructing the ROC curve and computing the area under the curve. The area under a ROC curve is calculated using the Wilcoxon non-parametric approach by comparing the predicted probabilities of all discordant pairs of observations. The ROC curve is constructed by plotting a series of pairs of true positive rate (sensitivity) and false positive rate (1 − specificity) calculated from varying cutoffs of positivity, escalated in increments of 0.005 in predicted probability. The computation software is written in SAS/IML/MACRO v8 and can be executed on any computer that has a working SAS v8 system with SAS/IML/MACRO.
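The two calculations described above (AUC via the Wilcoxon comparison of discordant pairs, and the ROC curve from cutoffs escalated in 0.005 increments) can be sketched in a few lines; the following Python snippet is a minimal illustration of the same arithmetic, not the published SAS/IML macro:

```python
import numpy as np

def wilcoxon_auc(p, y):
    """AUC via the Wilcoxon (Mann-Whitney) statistic: the fraction of
    (positive, negative) pairs in which the positive observation has the
    higher predicted probability, counting ties as one half."""
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # strictly concordant pairs
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs count 1/2
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def roc_points(p, y, step=0.005):
    """(FPR, TPR) pairs from positivity cutoffs escalated in `step` increments."""
    pts = []
    for cut in np.arange(0.0, 1.0 + step, step):
        pred = p >= cut
        tpr = (pred & (y == 1)).sum() / max((y == 1).sum(), 1)
        fpr = (pred & (y == 0)).sum() / max((y == 0).sum(), 1)
        pts.append((fpr, tpr))
    return pts
```

Here `p` would hold the GLMM-predicted probabilities and `y` the observed disease status; the pairwise comparison makes the discordant-pair counting explicit at the cost of O(n²) memory.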
Sample Size Calculation and Power Analysis of Time-Averaged Difference
Little research has been done on sample size and power analysis under repeated measures designs. With detailed derivation, we present sample size calculation and power analysis equations for the time-averaged difference that allow unequal sample sizes between two groups for both continuous and binary measures, and we explore, through simulation, the relative importance of the number of unique subjects and the number of repeated measurements within each subject for statistical power.
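For the continuous-outcome, equal-allocation case, the time-averaged-difference sample size under a compound-symmetry correlation structure is commonly written as n = 2(z₁₋α/₂ + z₁₋β)² σ² [1 + (m − 1)ρ] / (m δ²). The sketch below implements this textbook form only; it is an illustration, since the paper's derivation additionally covers unequal group sizes and binary measures:

```python
from math import ceil
from statistics import NormalDist

def n_per_group_tad(delta, sigma, m, rho, alpha=0.05, power=0.80):
    """Per-group sample size to detect a time-averaged difference `delta`
    between two equally sized groups, with `m` repeated measurements per
    subject, within-subject correlation `rho` (compound symmetry), and
    residual SD `sigma`; two-sided normal-approximation formula."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = 2 * (z_a + z_b) ** 2 * sigma ** 2 * (1 + (m - 1) * rho) / (m * delta ** 2)
    return ceil(n)
```

The factor [1 + (m − 1)ρ]/m makes the trade-off in the abstract explicit: when ρ is large, adding repeated measurements per subject buys much less power than adding subjects.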
Learn from Yesterday: A Semi-Supervised Continual Learning Method for Supervision-Limited Text-to-SQL Task Streams
Conventional text-to-SQL studies are limited to a single task with a fixed-size training and test set. When confronted with a stream of tasks, common in real-world applications, existing methods struggle with insufficient supervised data and high retraining costs. The former tends to cause overfitting on unseen databases for the new task, while the latter makes a full review of instances from past tasks impractical, resulting in forgetting of learned SQL structures and database schemas. To address these problems, this paper proposes integrating semi-supervised learning (SSL) and continual learning (CL) in a stream of text-to-SQL tasks and offers two promising solutions in turn. The first solution, Vanilla, performs self-training, augmenting the supervised training data with predicted pseudo-labeled instances of the current task, while replacing full-volume retraining with episodic memory replay to balance training efficiency against performance on previous tasks. The improved solution, SFNet, takes advantage of the intrinsic connection between CL and SSL: it uses in-memory past information to help current SSL, while adding high-quality pseudo instances to memory to improve future replay. Experiments on two datasets show that SFNet outperforms the widely used SSL-only and CL-only baselines on multiple metrics.
Comment: Accepted by AAAI-202
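The Vanilla solution's loop (self-training on confident pseudo-labels plus episodic memory replay instead of full retraining) can be sketched schematically. The `MajorityModel` and the reservoir-style memory update below are hypothetical stand-ins for illustration, not the paper's text-to-SQL parser or its exact replay policy:

```python
import random

class MajorityModel:
    """Toy stand-in for a text-to-SQL parser: predicts the majority label.
    (Hypothetical; just enough interface for the training-loop sketch.)"""
    def fit(self, data):
        labels = [y for _, y in data]
        self.label = max(set(labels), key=labels.count) if labels else None
    def predict(self, x):
        return self.label
    def confidence(self, x):
        return 1.0

def train_task_stream(model, tasks, memory, mem_size=64, conf_threshold=0.9):
    """Per task: train on labeled data plus replayed memory, add confident
    pseudo-labeled instances of the current task, then update the episodic
    memory (reservoir-style replacement once it is full)."""
    for labeled, unlabeled in tasks:
        model.fit(labeled + list(memory))            # supervised + replay
        pseudo = [(x, model.predict(x)) for x in unlabeled
                  if model.confidence(x) >= conf_threshold]
        model.fit(labeled + pseudo + list(memory))   # augmented training
        for ex in labeled + pseudo:                  # memory update
            if len(memory) < mem_size:
                memory.append(ex)
            else:
                memory[random.randrange(len(memory))] = ex
    return model, memory
```

Replaying only a small fixed-size memory is what keeps the per-task cost bounded while mitigating forgetting of earlier tasks.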
A method for analyzing censored survival phenotype with gene expression data
Background: Survival time is an important clinical trait for many disease studies. Previous work has shown certain relationships between patients' gene expression profiles and survival time. However, due to the censoring of survival times and the high dimensionality of gene expression data, effective and unbiased selection of a gene expression signature to predict survival probabilities requires further study.
Method: We propose a method for an integrated study of survival time and gene expression. The method is a two-step procedure: in the first step, a moderate number of genes are pre-selected using correlation or liquid association (LA), with imputation and transformation methods employed for the correlation/LA calculation. In the second step, the dimension of the predictors is further reduced using modified sliced inverse regression for censored data (censorSIR).
Results: The new method is tested on both simulated and real data. For the real-data application, we used a set of 295 breast cancer patients and found a linear combination of 22 gene expression profiles that is significantly correlated with patients' survival rate.
Conclusion: By an appropriate combination of feature selection and dimension reduction, we obtain a method for identifying gene expression signatures that is effective for survival prediction.
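Step one of the procedure (pre-selecting genes by correlation with transformed survival times) might look roughly as follows. The crude 1.5× inflation of censored times is an illustrative placeholder, not the paper's imputation method, and plain Pearson correlation stands in for the correlation/LA screening:

```python
import numpy as np

def preselect_genes(expr, time, event, k=50):
    """Rank genes by absolute Pearson correlation between expression and
    log-transformed survival time, returning the indices of the top k.
    expr: (n_samples, n_genes); event: 1 = death observed, 0 = censored."""
    t = time.astype(float).copy()
    t[event == 0] *= 1.5                 # crude imputation for censored times (assumption)
    z = np.log(t)                        # variance-stabilizing transformation
    z = (z - z.mean()) / z.std()
    x = (expr - expr.mean(0)) / expr.std(0)
    corr = x.T @ z / len(z)              # per-gene Pearson correlation
    return np.argsort(-np.abs(corr))[:k]
```

The pre-selected genes would then be passed to censorSIR for the second-stage dimension reduction.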
Learning Effective NeRFs and SDFs Representations with 3D Generative Adversarial Networks for 3D Object Generation: Technical Report for ICCV 2023 OmniObject3D Challenge
In this technical report, we present a solution for 3D object generation in the ICCV 2023 OmniObject3D Challenge. In recent years, 3D object generation has made great progress and achieved promising results, but it remains a challenging task due to the difficulty of generating complex, textured and high-fidelity results. To address this problem, we study learning effective NeRF and SDF representations with 3D Generative Adversarial Networks (GANs) for 3D object generation. Specifically, inspired by recent works, we use efficient geometry-aware 3D GANs as the backbone, incorporating label embedding and color mapping, which enables training the model on different taxonomies simultaneously. Then, through a decoder, we aggregate the resulting features to generate Neural Radiance Field (NeRF) based representations for rendering high-fidelity synthetic images. Meanwhile, we optimize Signed Distance Functions (SDFs) to effectively represent objects with 3D meshes. Moreover, we observe that this model can be trained effectively with only a few images of each object from a variety of classes, instead of using a great number of images per object or training one model per class. With this pipeline, we can optimize an effective model for 3D object generation. This solution was among the final top-three solutions in the ICCV 2023 OmniObject3D Challenge.
Switching of easy-axis to easy-plane anisotropy in cobalt(II) complexes
A tetranuclear cubane-type complex [Co4(ntfa)4(CH3O)4(CH3OH)4] (1) with a {Co4O4} core and a mononuclear complex [Co(ntfa)2(CH3OH)2] (2) have been rationally obtained by adjusting the ratio of the β-diketonate and Co(II) ions, with the synthetic processes monitored by in situ microcalorimetry. Then, following the synthetic conditions used to obtain 2, but with three distinct N-donor coligands - 2,2'-bipyridyl (bpy), 6,6'-dimethyl-2,2'-bipyridyl (6,6-(CH3)2-bpy) and 5,5'-dimethyl-2,2'-bipyridyl (5,5-(CH3)2-bpy) - three novel mononuclear complexes have been obtained: [Co(ntfa)2(bpy)2] (3), [Co(ntfa)2(6,6-(CH3)2-bpy)2] (4) and [Co(ntfa)2(5,5-(CH3)2-bpy)2] (5). The introduction of different capping coligands, as single-crystal X-ray crystallography ascertains, fine-tunes the structures, with changes in both the degree of distortion of the coordination geometry and the intermolecular interactions, which have a direct impact on the magnetic properties of these complexes. Magnetic investigations reveal field-induced single-ion magnet behavior in all complexes, with distinct energy barriers (Ueff) of 39.06 (1), 36.65 (2), 36.32 (3), 28.26 (4) and 15.85 K (5). Magnetic experiments together with HF-EPR measurements and theoretical calculations demonstrate that 2 features easy-axis magnetic anisotropy (D = −60.48 cm⁻¹), whereas 3-5 show easy-plane magnetic anisotropy (D = +70.77 cm⁻¹ for 3, +35.71 cm⁻¹ for 4, and +51.28 cm⁻¹ for 5). To our knowledge, such a reversal of the nature of the anisotropy driven by coligands is unprecedented.
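The easy-axis versus easy-plane distinction reported here corresponds to the sign of the axial parameter D in the standard zero-field-splitting spin Hamiltonian:

```latex
\hat{H}_{\mathrm{ZFS}} = D\left[\hat{S}_z^2 - \tfrac{1}{3}S(S+1)\right] + E\left(\hat{S}_x^2 - \hat{S}_y^2\right)
```

D < 0 stabilizes the largest |m_S| states along z (easy-axis, as in 2), D > 0 favors the plane perpendicular to z (easy-plane, as in 3-5), and E quantifies the rhombic distortion.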